#######################################################################################################################
# Do Local Roots Impact Washington Behaviors? District Connections and Representation in the U.S. Congress ############
# Jesse Crosson and Jaclyn Kaslovsky                                                                       ############
# American Political Science Review                                                                        ############
# README File                                                                                              ############
#######################################################################################################################

This file outlines the software, code, and data used to produce the results presented in "Do Local Roots Impact Washington Behaviors? District Connections and Representation in the U.S. Congress," by Jesse Crosson and Jaclyn Kaslovsky.
To replicate Table 1 from the main text and Table I.1, replicators should run "Multinomial Logits.do".
To replicate all other tables from the main text, replicators should run "Main Analysis Code.R".
To replicate all tables from the supplementary materials, replicators should open and run "Appendix Code.R".
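
As a convenience, the R scripts named above could be run in sequence with a small wrapper like the one below. This is only a sketch: the helper name run_replication is hypothetical, it assumes the working directory is the folder containing the scripts, and the script names should be adjusted if the files in the package are named differently. The Stata script still needs to be run separately in Stata.

```r
# Hypothetical helper (not part of the replication package): run the R
# scripts in order, stopping early if any of them cannot be found.
run_replication <- function(root = ".",
                            scripts = c("Main Analysis Code.R",
                                        "Appendix Code.R")) {
  paths <- file.path(root, scripts)
  missing <- scripts[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Script(s) not found: ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    # Evaluate each script in a fresh environment so that objects created
    # by one script are not silently reused by the next.
    source(p, local = new.env())
  }
  invisible(paths)
}

# Usage, from the folder that contains the scripts:
# run_replication()
```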


##########################################################################################
	## Scripts ##
##########################################################################################

# Final Outputs 

	"Main Analysis Code.R"
		loads in final datasets, runs regressions, and outputs Tables 2 through 6 in the main text.

	"Appendix Regressions.R"
		loads in final datasets, runs regressions, and outputs all tables in the appendix text.

	"Multinomial Logits.do"
		loads in the data and produces Table 1 in the main text and Table I.6 in the appendix.


# Data Creation

- Note that to run many of these files, one should download all shapefiles from the "Digital Boundary Definitions of United States Congressional Districts, 1789-2012" at http://cdmaps.polisci.ucla.edu and put the zip files in a folder called "Shapefiles". Shapefiles for the 115th and 116th Congresses can be downloaded from https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.2017.html#list-tab-1556094155.
- Further, to run the cosponsorship file, one should download the following:
    * ProPublica's U.S. Congress bulk data (available at https://www.propublica.org/datastore/dataset/congressional-data-bulk-legislation-bills)
    * legislators-historical.csv & legislators-current.csv from https://github.com/unitedstates/congress
    * Fowler's (2006) cosponsorship and GovTrack data, from https://web.archive.org/web/20210414134647/http://jhfowler.ucsd.edu/cosponsorship.htm. Note that these data sources are in zip files, which include several other files necessary to run our cosponsorship code.
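
Before running the data creation scripts, it may help to confirm that the downloaded zip files are where the note above says they should be. The sketch below only lists what is present and changes nothing on disk. The function name list_shapefile_zips is hypothetical, and it assumes the "Shapefiles" folder sits in the working directory.

```r
# Hypothetical helper: list the zip archives found in the "Shapefiles" folder.
list_shapefile_zips <- function(shp_dir = "Shapefiles") {
  if (!dir.exists(shp_dir)) {
    stop("Folder not found: ", shp_dir)
  }
  zips <- list.files(shp_dir, pattern = "\\.zip$", ignore.case = TRUE)
  if (length(zips) == 0) {
    warning("No zip files found in ", shp_dir)
  }
  zips
}

# Usage, from the folder that contains "Shapefiles":
# length(list_shapefile_zips())
```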

	"DataCreation_01_ExtractHometowns.R" 
		creates the original list of legislator hometowns 

	"DataCreation_02_OverlayingDistricts.R" 
		overlays the list of hometowns on top of shapefiles of congressional districts

	"DataCreation_03_CalculatingDistance.R"
		calculates the distance between the legislator's hometown and the boundary of their congressional district

	"DataCreation_04_CreatingPartyUnity.R"
		creates a dataset of party unity scores

	"DataCreation_05_CosponBipartisanshipMeasure.R"
		creates a dataset of how frequently members cosponsor with members of the opposite party 
	
	"DataCreation_06_FinalDataCombination.R"
		aggregates all datasets together and makes our final output
		

##########################################################################################	
	## Data ##
##########################################################################################

# Final Outputs 

	"Dataset_ForMainAnalysis.dta": legislator-congress level data from 1973 to 2020 used in main analyses

	"Dataset_OvertimeComparisons.dta": legislator-congress level data from 1889 to 2020 used for descriptive purposes

	
Additional information about the variables included in the datasets can be found in the accompanying codebook. 
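
Because the data are at the legislator-congress level, it can be useful to translate between Congress numbers and calendar years. The sketch below relies only on the fact that the 1st Congress convened in 1789 and that each Congress lasts two years; the function name is hypothetical, and whether the datasets store a Congress number in this form should be checked against the codebook. Under that arithmetic, the start years listed above (1889 and 1973) line up with the 51st and 93rd Congresses, and 2020 falls within the 116th.

```r
# Hypothetical helper: first calendar year of a given Congress.
congress_start_year <- function(congress) {
  1789 + 2 * (congress - 1)
}

congress_start_year(c(51, 93, 116))
```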

# Data Creation
# Note that in the code, these datasets are assumed to be in a folder called "Intermediate Data"

	"aip_cd_ideology_v2022a.RData": data on congressional district ideology from Warshaw, Christopher, and Chris Tausanovitch. 2022. “Subnational ideology and presidential vote estimates (v2022).”. URL: https://doi.org/10.7910/DVN/BQKU4M

	"allCongressDataPublishV2.csv": data on congressional district demographics from Foster-Molina, Ella. 2017. “Historical Congressional Legislation and District Demographics 1972-2014.” URL: https://doi.org/10.7910/DVN/CI2EPI

	"cel_cw.csv": more limited version of Volden and Wiseman's (2014) updated dataset, available at thelawmakers.org

	"CELHouse93to116Reduced.dta": data on legislative effectiveness and member traits from Volden, Craig, and Alan E Wiseman. 2020. “Legislative Effectiveness Senate Data
	from 1973-2018 [Computer file].”. The Center for Effective Lawmaking [distributor]. https//thelawmakers.org/data-download.
	 
	"Census_MobilityData.RData": dataset created by downloading American Community 1-year survey variables from Social Explorer.

	"census_variables_complete.dta": congressional district demographic variables to fill out the final years not covered by Foster-Molina (2017). Downloaded from Social Explorer.

	"congicpsrstyle.csv": legislator 'type' data, taken from Bernhard and Sulkin's legislative style book. Data provided directly by the authors.
	
	"cu.data.1.AllItems.txt": inflation adjustment dataset from the Bureau of Labor Statistics at https://download.bls.gov/pub/time.series/cu/cu.data.1.AllItems

	"dime_recipients_1979_2018_winners.csv": Bonica’s DIME dataset, available at https://data.stanford.edu/dime, subsetted to include only winners of general elections.

	"govtrack_cosponsor_data_109_congress.csv": GovTrack cosponsorship data, compiled by James Fowler

	"govtrack_cosponsor_data_110_congress.csv": GovTrack cosponsorship data, compiled by James Fowler

	"govtrack_cosponsor_data_111_congress.csv": GovTrack cosponsorship data, compiled by James Fowler

	"H114_members.csv","H115_members.csv","H116_members.csv": data on members from Lewis, Jeffrey B., Keith Poole, Howard Rosenthal, Adam Boche, Aaron Rudkin, and Luke Sonnet (2024). Voteview: Congressional Roll-Call Votes Database. https://voteview.com/

	"H114_votes.csv","H115_votes.csv","H116_votes.csv": data on voting from Lewis, Jeffrey B., Keith Poole, Howard Rosenthal, Adam Boche, Aaron Rudkin, and Luke Sonnet (2024). Voteview: Congressional Roll-Call Votes Database. https://voteview.com/

	"Hall_members.csv": data on House member ideology scores from Lewis, Jeffrey B., Keith Poole, Howard Rosenthal, Adam Boche, Aaron Rudkin, and Luke Sonnet (2024). Voteview: Congressional Roll-Call Votes Database. https://voteview.com/

	"hfa_replication.dta": data on local roots from Hunt, Charles R. 2022. Home Field Advantage: Roots, Reelection, and Representation in the Modern Congress. Ann Arbor, MI: University of Michigan Press.

	"House_Party_Unity_35-113.xls": data on party unity from Voteview. (n.d.). Party Unity Scores for Democrat and Republican Members of Congresses 35 - 113 (1857 - 2014). Party Unity Scores for Members Page. https://legacy.voteview.com/Party_Unity.htm 

	"HSall_members.csv": data on all House members and Senators from Lewis, Jeffrey B., Keith Poole, Howard Rosenthal, Adam Boche, Aaron Rudkin, and Luke Sonnet (2024). Voteview: Congressional Roll-Call Votes Database. https://voteview.com/

	"icpsrs_tofix.csv": a list of legislators with corrected hometowns

	"legislators_historical.csv" &  "legislators_current.csv": from https://github.com/unitedstates/congress, these files contain biographical information and a crosswalk for unique identifiers like thomasIDs, bioguideIDs, and ICPSR IDs.

	"Member_Staff_Data_hill_1212020.csv": data on House staffing patterns from Crosson, Jesse M, Alexander C Furnas, Timothy Lapira, and Casey Burgat. 2021. “Partisan Competition and The Decline in Legislative Capacity Among Congressional Offices.” Legislative Studies Quarterly 46 (3): 745–789. 

	"missing_CW.csv" hand-created cross-walk for ICPSR IDs that were missing BioguideIDs.

	"party.txt": from Fowler's (2006) cosponsorship data, this provides the party ID information necessary to calculate our bipartisanship-in-cosponsorship measure. 

	"party_bios_109-111.csv": same as party.txt, but for Fowler's govtrack years.
	
	"presidential_voteshare.dta": presidential vote share data compiled and generously shared by Gary Jacobson

	"states.csv": dataset of state names
	


##########################################################################################	
## Software Dependencies for Final Output Files ##
##########################################################################################	

All analyses were originally undertaken in R version 4.2.3 "Shortstop Beagle" on a system running macOS Catalina 10.15.7.

Package dependencies in Main Regressions File: 
[1] readstata13_0.10.1     lfe_2.9-0              Matrix_1.6-0          
[4] cowplot_1.1.1          scales_1.3.0           forcats_1.0.0         
[7] ggpubr_0.6.0           ggplot2_3.4.4          starpolishr_0.0.0.9007
[10] stargazer_5.2.3        dplyr_1.1.4   

Package dependencies in Appendix File:
[1] readstata13_0.10.1     lme4_1.1-34            memisc_0.99.31.6      
[4] MASS_7.3-58.2          lattice_0.20-45        broom_1.0.5           
[7] starpolishr_0.0.0.9007 stargazer_5.2.3        lfe_2.9-0             
[10] Matrix_1.6-0           haven_2.5.3            lemon_0.4.6           
[13] ggpubr_0.6.0           tidyr_1.3.0            ggridges_0.5.4        
[16] gridExtra_2.3          cowplot_1.1.1          qwraps2_0.5.2         
[19] doBy_4.6.17            ggthemes_4.2.4         ggplot2_3.4.4         
[22] dplyr_1.1.4 
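
To see how a local R installation compares with the versions listed above, the name_version strings can be parsed and checked against what is installed. This is a sketch rather than part of the replication package: parse_dep and check_deps are hypothetical helper names, and the example covers only the packages listed for the main regressions file. A version mismatch does not necessarily mean the results will differ.

```r
# Hypothetical helper: split "name_version" strings at the last underscore.
parse_dep <- function(x) {
  data.frame(package = sub("_[^_]*$", "", x),
             version = sub("^.*_", "", x),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: compare listed versions with installed versions.
check_deps <- function(deps) {
  d <- parse_dep(deps)
  d$installed <- NA_character_
  d$match <- FALSE
  for (i in seq_len(nrow(d))) {
    # system.file() returns "" when the package is not installed.
    if (nzchar(system.file(package = d$package[i]))) {
      v <- utils::packageVersion(d$package[i])
      d$installed[i] <- as.character(v)
      # package_version() treats "2.9-0" and "2.9.0" as the same version.
      d$match[i] <- v == package_version(d$version[i])
    }
  }
  d
}

main_deps <- c("readstata13_0.10.1", "lfe_2.9-0", "Matrix_1.6-0",
               "cowplot_1.1.1", "scales_1.3.0", "forcats_1.0.0",
               "ggpubr_0.6.0", "ggplot2_3.4.4", "starpolishr_0.0.0.9007",
               "stargazer_5.2.3", "dplyr_1.1.4")
check_deps(main_deps)
```

Rows with NA in the installed column are packages that are not installed locally.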
		
		